Tech News: ChatGPT Rolls Out Voice Mode

Written by: Paul

In a reply to a user question on the X platform, OpenAI’s CEO, Sam Altman, said that the alpha rollout of ChatGPT’s long-awaited ‘Voice Mode’ starts for Plus subscribers next week.

What Is Voice Mode? 

As the name suggests, ChatGPT’s ‘Voice Mode’ lets users interact with the AI using spoken language instead of typing. It uses advanced speech recognition technology to convert spoken input into text, and text-to-speech (TTS) technology to vocalise its responses. This should make ChatGPT more accessible and convenient for users who prefer verbal communication or who have difficulty typing. Using Voice Mode requires a device with a microphone and speaker, and specific commands or settings may be needed to activate and customise the experience.
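The flow described above (speech recognition in, a text reply, text-to-speech out) can be pictured as a simple three-stage pipeline. The Python sketch below is purely illustrative and is not OpenAI’s implementation: the stage names (`transcribe`, `respond`, `synthesize`) and the `VoiceTurn` type are hypothetical, and the demo uses toy stand-ins that treat “audio” as plain UTF-8 bytes rather than calling real speech models.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (an assumption for illustration):
# audio is modelled as raw bytes, text as str.
Transcriber = Callable[[bytes], str]   # speech recognition: audio -> text
Responder = Callable[[str], str]       # AI model: text -> reply text
Synthesizer = Callable[[str], bytes]   # text-to-speech: text -> audio


@dataclass
class VoiceTurn:
    """One spoken exchange: what was heard, what was said back, and the reply audio."""
    transcript: str
    reply_text: str
    reply_audio: bytes


def voice_turn(
    audio_in: bytes,
    transcribe: Transcriber,
    respond: Responder,
    synthesize: Synthesizer,
) -> VoiceTurn:
    """Run one exchange through the three stages in order."""
    transcript = transcribe(audio_in)     # 1. convert spoken input to text
    reply_text = respond(transcript)      # 2. generate a text reply
    reply_audio = synthesize(reply_text)  # 3. vocalise the reply
    return VoiceTurn(transcript, reply_text, reply_audio)


if __name__ == "__main__":
    # Toy stand-ins only: "audio" here is just UTF-8 encoded text.
    fake_transcribe = lambda audio: audio.decode("utf-8")
    fake_respond = lambda text: f"You said: {text}"
    fake_synthesize = lambda text: text.encode("utf-8")

    turn = voice_turn(b"What time is it?", fake_transcribe, fake_respond, fake_synthesize)
    print(turn.reply_text)  # prints: You said: What time is it?
```

Keeping each stage as a swappable function means a real speech recogniser, AI model, or TTS engine could, in principle, be dropped in without changing the overall loop.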

Competition and Readiness For Apple Integration

With Voice Mode, OpenAI hopes to make interactions with ChatGPT more natural and user-friendly. OpenAI also needs Voice Mode to compete with the voice feature of Google’s Gemini and to prepare ChatGPT for its integration with Apple devices and systems, as announced at this year’s WWDC.

What’s Been The Hold-Up? 

Originally planned for late June, the rollout to OpenAI’s alpha users (a select group of people who test the feature and provide feedback) was pushed back by a month to improve the “model’s ability to detect and refuse certain content”. Many regular ChatGPT users have been eagerly awaiting the feature, and it has been the subject of frequent questions on social media. Further delays for testing remain possible, despite Sam Altman’s announcement that Plus subscribers will be getting Voice Mode next week.

Omni and Whisper

Voice Mode’s rollout also had to wait for GPT-4o (“o” for “omni”), OpenAI’s newest model, which has been described as “a step towards much more natural human-computer interaction”. Omni accepts any combination of text, audio, image, and video as input and can generate any combination of text, audio, and image outputs. It can also respond to audio inputs with response times close to those of human conversation, which is why Omni is so important for Voice Mode to work well. In addition to Omni, Voice Mode will use a “neural net” called ‘Whisper’ that OpenAI says “approaches human level robustness and accuracy on English speech recognition”.

What’s So Special About ChatGPT’s Voice Mode? 

Essentially, Voice Mode’s main advantages are that it can enable “real-time, natural conversations with AI” and (as mentioned above) deliver text, audio, and image outputs. It can also talk in character voices, role-play interviews, and even help users learn new languages.

Business Uses 

The potential business uses for Voice Mode are many and could include automating customer support, enhancing virtual assistant functions for employees, creating interactive marketing campaigns, facilitating training and onboarding, and improving accessibility for users with disabilities.

What Does This Mean For Your Business? 

The long-awaited (and still pending) introduction of ChatGPT’s Voice Mode represents a significant advance for businesses, offering a new way of interacting with AI that could transform various aspects of their operations. In customer support, for example, Voice Mode could automate responses to customer enquiries, providing quick, accurate assistance and freeing up staff for more complex issues. This could lead to greater customer satisfaction (and loyalty).

In internal operations, integrating ChatGPT as a virtual assistant could help streamline employees’ tasks, improving productivity and efficiency by handling scheduling, reminders, and information retrieval through simple voice commands. Voice-driven marketing campaigns could also open up new ways to engage customers, offering personalised experiences and product recommendations that could boost conversion rates. Training and onboarding could benefit too, e.g. with Voice Mode providing interactive, voice-guided instructions that make learning more intuitive and effective for new employees. Also, Voice Mode’s advanced speech recognition and text-to-speech capabilities could (as mentioned above) improve accessibility, making digital interactions easier for people with disabilities and for those who prefer speaking to typing. This inclusivity could improve the user experience and expand the reach of your business’s services and products.

For OpenAI, the rollout of Voice Mode (hopefully soon) will be a major milestone that strengthens its position in the AI market. By adding natural, real-time voice interactions to ChatGPT, OpenAI can demonstrate its commitment to advancing AI capabilities and user experience. The feature could help set ChatGPT apart from competitors (e.g. Google) and showcase the versatility and potential of OpenAI’s technology.

The impending integration with Apple devices will further solidify OpenAI’s reach and influence, making its AI tools more accessible and embedded in everyday technology. The introduction of ChatGPT’s Voice Mode, therefore, raises the competitive stakes in the AI industry. Competitors like Google, with its Gemini voice feature, will need to innovate rapidly to keep pace. ChatGPT’s ability to offer multimodal interactions, combining text, audio, and image, positions it as a more comprehensive and flexible tool than other AI systems. This puts pressure on competitors to enhance their offerings and develop similar or superior capabilities to maintain their market positions.

In essence, integrating ChatGPT’s Voice Mode into your business operations could lead to greater efficiency, enhanced customer engagement, and improved accessibility, helping to position your company at the forefront of technological innovation and customer service excellence. For OpenAI, it marks a significant achievement and competitive advantage, while for competitors, it signifies a challenging new benchmark.